Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
Authors
Abstract:
Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop doesn’t consider load state of each node in distribution input data blocks, which may cause inappropriate overhead and reduce Hadoop performance, but in practice, such data placement policy can noticeably reduce MapReduce performance and may increase extra energy dissipation in heterogeneous environments. This paper proposes a resource aware adaptive dynamic data placement algorithm (ADDP) .With ADDP algorithm, we can resolve the unbalanced node workload problem based on node load status. The proposed method can dynamically adapt and balance data stored on each node based on node load status in a heterogeneous Hadoop cluster. Experimental results show that data transfer overhead decreases in comparison with DDP and traditional Hadoop algorithms. Moreover, the proposed method can decrease the execution time and improve the system’s throughput by increasing resource utilization
similar resources
An adaptive scheduling algorithm for dynamic heterogeneous Hadoop systems
The MapReduce and Hadoop frameworks were designed to support efficient large scale computations. There has been growing interest in employing Hadoop clusters for various diverse applications. A large number of (heterogeneous) clients, using the same Hadoop cluster, can result in tensions between the various performance metrics by which such systems are measured. On the one hand, from the servic...
full textAn Improved Data Placement Strategy in a Heterogeneous Hadoop Cluster
Hadoop Distributed File System (HDFS) is designed to store big data reliably, and to stream these data at high bandwidth to user applications. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous, and randomly place blocks without considering any nodes’ resource characteristics, which decreases self-adaptability of the system. In this paper, we ...
full textData Placement Strategy for Hadoop Clusters
Wireless technology has become very widely used; and an array of security measures, such as authentication, confidentiality strategies, and 802.11 wireless communication protocol based security schemas have been proposed and applied to real-time wireless networks. However, most of the measures only consider security issues in static mode, in which security levels are all configured when wireles...
full textHeterogeneous Neural Networks for Adaptive Behavior in Dynamic Environments
Leon S. Sterling CS Dept. & CAISR CWRU Research in artificial neural networks has genera1ly emphasized homogeneous architectures. In contrast, the nervous systems of natural animals exhibit great heterogeneity in both their elements and patterns of interconnection. This heterogeneity is crucial to the flexible generation of behavior which is essential for survival in a complex, dynamic environm...
full textIntelligent Block Placement Strategy in Heterogeneous Hadoop Clusters
MapReduce is an important distributed processing model for large-scale data-intensive applications. As an open-source implementation of MapReduce, Hadoop provides enterprises with a cost-efficient solution for their analytics needs. However, the default HDFS block placement policy assumes that computing nodes in a cluster are homogeneous, and tries to balance load by placing blocks randomly, wh...
full textA Hybrid Dynamic Load Balancing Algorithm for Heterogeneous Environments
Dynamic load balancing has become a necessity in emerging distributed environments to address the inherent heterogeneity in computing resources. The majority of traditional dynamic load balancing algorithms were developed assuming homogeneous set of nodes and suffer significant performance drop under variety of workloads. Moreover, the centralized dynamic approach limits the scalability with th...
full textMy Resources
Journal title
volume 2 issue 4
pages 17- 30
publication date 2016-12-01
By following a journal you will be notified via email when a new issue of this journal is published.
Hosted on Doprax cloud platform doprax.com
copyright © 2015-2023